ABSTRACT


This article is an exploratory analysis and comparison of the demographic distributions of data collected from the 2016
New Coder Survey with those of data obtained from the Integrated Postsecondary Education Data System (IPEDS). In
comparing the data sets, the findings suggest that, overall, females were more likely to engage in online self-paced coding
education, particularly when they had no background or previous study in an IT discipline. This contrasted strongly with
females who already held an IT qualification. With respect to ethnicity, the research found that students who identify as
an ethnic minority were more likely to undertake formal tertiary education in IT than to engage in online coding study.
The research also confirmed that the average age was higher, and the spread of age groups wider, for those undertaking
online study than for those undertaking formal tertiary study. The practical implications of this analysis for diversity
in Information Technology disciplines such as computer science, and in STEM-related disciplines more broadly, are
discussed.
1. INTRODUCTION


If effectively managed in the workplace, diversity can help encourage creativity and innovation (Ostergaard
et al 2011), improve team performance (McLeod et al 1996), and identify new product and market
opportunities (Robbins 2004, Bourgeois 2018). It is therefore important that the education sector embraces
diversity in order to produce graduates who are themselves committed to diversity (Bial 2016, Bourgeois 2008).
Aside from its economic benefits, diversity in the workplace and in education is also considered a key aspect of
social justice (Sue 2008, Ayers et al. 2008).

However, despite continued efforts toward equal opportunity, minority groups remain underrepresented in
various subject areas, in particular computer science. For example, in the US, the percentage of bachelor's
degrees in computer science awarded to females increased from 13.6% in 1970-1971 to 37% in 1983-1984 but
gradually declined to 18% in 2010-2011 (Kendall 2017). Both enrolment and completion rates in computer
science are lower for females than for males (Miliszewska et al 2006). In terms of ethnicity, Taylor and Ladner
(2012) showed that there was little improvement between 2000 and 2009 in the underrepresentation of some
ethnic groups (African Americans, Hispanics, and American Indians or Alaska Natives) in computing. A more
recent study shows that ethnic and gender gaps persist in computer science education (Google Inc. and Gallup
Inc. 2016). Similar problems are observed in other parts of the world, including the UK, New Zealand, Australia
and South Korea (Glick 2017, UNESCO 2017).

Ageism is another major diversity concern in the IT sector. Castillo (2017) reports that many people over 40
find it hard to get a job in the industry. More than 40% of IT workers worry about losing their jobs because of
their age (Sumagaysay 2017). Serrels (2018) outlines several real examples of ageism in recruitment in video
game development. Indeed, many tech giants such as IBM, Amazon, Facebook and Intel are now facing charges
or being investigated for ageism (Mcintyre 2018, Claburn 2018, Wells 2018).

The purpose of this study is to investigate whether the use of online learning can reduce some of the 
diversity gaps compared to formal undergraduate education with a focus on gender, ethnicity and age. In the 
following sections, we shall (1) review the relevant literature to discuss how online learning can potentially
address diverse student needs, (2) describe our research design and methodology, (3) analyze our data, 
(4) discuss the practical implications of our data analysis and (5) summarize our findings and identify future 
research directions. 


2. LITERATURE REVIEW 


2.1 Addressing Diversity Gaps in STEM 


Industry reports and the academic literature have offered a number of explanations for the persistence of
diversity gaps observed in computing, including insufficient recruitment and retention efforts targeting 
minority groups (Whittaker and Montgomery 2013); insufficient diversity in faculty members (Towns 2010); 
subtle discrimination in the workplace and in education (Marder 2012, Moss-Racusin et al 2012); and 
insufficient incentive for diversity commitment among faculty members (Whittaker and Montgomery 2013). 

To address the persistent diversity gaps, organizations have dedicated resources to developing interest among
underrepresented minorities at the high school level (Bystydzienski et al 2015, Cheryan et al 2015).
E-mentoring has been used to give underrepresented groups electronic access to mentors with similar
backgrounds at other institutions (Wadia-Fascetto and Leventman 2000, Blake-Beard et al 2011). It is also
recommended that tertiary institutions cultivate commitment to diversity through formalized policies,
engagement and accountability (Whittaker and Montgomery 2013). Implicit bias training has also been
shown to improve attitudes toward women in STEM (Jackson et al 2014). Various learning methods and
interventions have also been found to reduce performance disparities among students of different
backgrounds, including pair programming (McDowell et al. 2006), value affirmation (Miyake et al 2010), and
structured course design and active learning (Haak et al 2011).


2.2 Online Learning and Diversity 


Baker et al (2018) conducted a field experiment on an online learning platform in which each comment was
assigned a student name connoting a specific race and gender, and found that instructors were 94% more
likely to respond to White male students. This result suggests that hidden biases exist even in the online
learning environment. On the other hand, Grella and Meinel (2016) found that although only 16% of STEM
learners in MOOCs are female, successful completion rates are about the same for female (25%) and male
(26%) learners. Furthermore, discussion forum participation, which increases the likelihood of successful
completion, is greater among female than male learners. A high level of involvement among female students
has also been reported in online learning of non-STEM subjects (Cuadrado-Garcia et al 2010). Drew et al
(2015) show that a hybrid online 2+2 STEM program increases participation of underrepresented minority
students compared with a similar traditional face-to-face 2+2 program. Together, these findings suggest that
online learning could potentially be used to resolve some of the issues that lead to diversity gaps in STEM
education.

Wladis et al (2015) show that, compared with face-to-face STEM courses, Black and Hispanic students are
significantly underrepresented in online STEM courses. However, females and students with non-traditional
risk factors (such as delayed enrolment, no high school diploma, part-time enrolment, financial independence,
having dependents, single-parent status, and full-time employment) are significantly overrepresented in online
STEM courses. This suggests that the diversity implications of online learning are complex and warrant further
research.


2.3 Research Question 


The purpose of this study is to analyze the demographic distribution of students who learn to code on an 
online platform compared to that of formal undergraduate education. Specifically, our research question is: 
Are there differences in the demographics of students learning to code online and students acquiring a 
formal IT-related degree in terms of gender, ethnic minority status and age? The answer to this question will 
allow us to evaluate the diversity implications of online learning of coding, and shed some light on why
online learning affects different diversity gaps differently. 


3. RESEARCH METHODOLOGY 


To compare the demographic distributions of online learning and formal undergraduate education, we make 
use of two publicly available data sources: (1) the 2016 New Coder Survey and (2) the Integrated 
Postsecondary Education Data System (IPEDS). 

The 2016 New Coder Survey was predominantly completed by online self-paced students of Free Code 
Camp (FCC) and CodeNewbie (CN). FCC is a self-education portal for people who are interested in software 
development and in learning to code, particularly in web-development technologies such as HTML, CSS,
JavaScript and jQuery, among others. CN is an online community focussed on the support and education of
users who are interested in coding. The survey asked up to 43 questions (depending on respondents’ answers) 
covering respondents’ learning approach as well as demographic and socio-economic data. 15,620 
respondents completed the survey; of these respondents, 6,265 were from the U.S. The survey was 
completely anonymous, and all questions were non-compulsory. The data can be downloaded from: 
https://github.com/freeCodeCamp/2016-new-coder-survey. 

IPEDS is a system containing data from surveys conducted annually by the U.S. Department of Education's
National Center for Education Statistics. The surveys collect data such as enrolments, program completions,
graduation rates, faculty and staff, finances, institutional prices and student financial data from institutions
that participate in federal student aid programs. The data can be downloaded from
https://nces.ed.gov/ipeds/use-the-data. To ensure comparability, non-U.S. data from the New Coder Survey
are excluded when comparing the demographic distributions of online learning and formal undergraduate
education. Since the IPEDS data set does not provide computer science enrolment data broken down by age
and ethnicity, we compare our New Coder Survey data with the completions data from the IPEDS data set,
specifically, degrees awarded under CIP Code 11: Computer and Information Sciences and Support Services
in 2016.¹

As part of the lead author's master's dissertation (Lane, 2017), the 2016 New Coder Survey was analysed and
compared with general findings from related research focussed on formal education. This paper refocuses the
survey analysis by contrasting it with comparable survey data from the formal education domain.


4. DATA ANALYSIS 


4.1 Educational Background of Respondents from the New Coder Survey 


Before comparing online learning and formal education, Table 1 provides some descriptive statistics on the
educational background of the New Coder Survey (online learning) respondents.


Table 1. Highest Education Attained by Respondents from the New Coder Survey 


Highest Education | Count (n) | Percentage (%)
No high school (secondary school) | 65 | 1.0375%
Some high school | 194 | 3.0966%
High school diploma or equivalent (GED) | 325 | 5.1875%
Some college credit, no degree | 1304 | 20.8140%
Trade, technical or vocational training | 134 | 2.1389%
Associate’s degree | 444 | 7.0870%
Bachelor’s degree | 2782 | 44.4054%
Master’s degree (non-professional) | 633 | 10.1038%
Professional degree (MBA, MD, JD, etc.) | 279 | 4.4533%
Ph.D. | 79 | 1.2610%
Missing value | 25 | 0.3990%
Total | 6265 | 100%

¹ We use the latest data set available at the time of writing, i.e., the provisional release data collected in the academic year 2016-2017.


As shown, more than half of the respondents from the New Coder Survey (60.2%) already hold at least a
bachelor's degree. This is consistent with Ho et al (2015) and Schmid et al (2015), who found that the majority
of massive open online course (MOOC) students are college graduates.
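The 60.2% figure can be checked directly from the Table 1 counts. The short Python sketch below sums the respondents whose highest qualification is a bachelor's degree or above; treating "Professional degree" as part of that group is our assumption, and the denominator is the reported total of 6,265.

```python
# Counts transcribed from Table 1 (U.S. respondents, 2016 New Coder Survey).
table1 = {
    "No high school (secondary school)": 65,
    "Some high school": 194,
    "High school diploma or equivalent (GED)": 325,
    "Some college credit, no degree": 1304,
    "Trade, technical or vocational training": 134,
    "Associate's degree": 444,
    "Bachelor's degree": 2782,
    "Master's degree (non-professional)": 633,
    "Professional degree (MBA, MD, JD, etc.)": 279,
    "Ph.D.": 79,
    "Missing value": 25,
}
TOTAL_RESPONDENTS = 6265  # total reported in Table 1

# Categories treated here as "bachelor's degree or higher" (an assumption).
degree_or_higher = [
    "Bachelor's degree",
    "Master's degree (non-professional)",
    "Professional degree (MBA, MD, JD, etc.)",
    "Ph.D.",
]

n_degree = sum(table1[k] for k in degree_or_higher)
share = n_degree / TOTAL_RESPONDENTS
print(f"{n_degree} of {TOTAL_RESPONDENTS} respondents ({share:.1%}) hold a bachelor's degree or higher")
```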


4.2 Online Learning vs. Formal Education: Gender 


The gender distributions from the New Coder Survey (online learning) and IPEDS (formal education) are 
shown in Table 2A. 
Table 2A. Gender Distributions 


Gender | Online Learning n | Online Learning % | Formal Education (Computer Science Only) n | Formal Education (Computer Science Only) %
Male | 4,369 | 69.737% | 410,508 | 76.272%
Female | 1,781 | 28.428% | 127,707 | 23.728%
Other | 94 | 1.500% | 0 | 0%
Missing value | 21 | 0.335% | 0 | 0%
Total | 6,265 | 100% | 538,215 | 100%


If we focus only on the two major groups (i.e., male and female) and perform a z-test to compare the two
population proportions, we find that the proportion of females in online learning is significantly higher than
the proportion of females in formal education (z = 9.2609, p < 0.001).
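A minimal sketch of a pooled two-proportion z-test on the Table 2A counts (males and females only) is shown below. The paper does not state which variant of the test was used; with the pooled standard error this sketch gives z of about 9.6 rather than the reported 9.2609, so it illustrates the method rather than reproducing the exact figure. The function name is our own.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test of p1 = p2; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Table 2A counts, males and females only.
online_female, online_male = 1781, 4369
formal_female, formal_male = 127707, 410508

z, p = two_proportion_z_test(
    online_female, online_female + online_male,
    formal_female, formal_female + formal_male,
)
print(f"z = {z:.2f}, p = {p:.2e}")
```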

We noted in Section 4.1 that the majority of our respondents are degree holders. To assess the democratizing
effect of online learning, we distinguish between those who majored in an IT-related subject and those who
majored in a non-IT-related subject. The New Coder Survey asked respondents to specify their major. Of the
U.S. sample, 4,158 respondents answered this question, specifying 426 distinct majors (e.g., Accounting,
Public Health, Women’s Studies). Two of the authors independently classified each of the unique majors as
“IT-related” or “non-IT-related” based on the name of the major. Of the 426 majors, 22 were classified
differently by the two authors, giving an overall agreement of 94.84%. The Cohen's kappa coefficient is
94.81%, suggesting high inter-rater reliability. For the 22 discrepancies, a third author made the final decision.
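Percentage agreement and Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), can both be computed from a 2 × 2 table of the two raters' labels. The sketch below uses a hypothetical split of the 426 majors: only the total and the 22 discrepancies come from the text, while the individual cell counts are invented for illustration. Because κ discounts the agreement expected by chance (p_e), it falls below the raw agreement whenever p_e > 0 and agreement is imperfect; for this hypothetical split, κ is about 0.87 while agreement is 94.84%.

```python
def agreement_and_kappa(both_it, both_non_it, a_it_b_non, a_non_b_it):
    """Percentage agreement and Cohen's kappa for two raters and two categories."""
    n = both_it + both_non_it + a_it_b_non + a_non_b_it
    observed = (both_it + both_non_it) / n  # p_o: share of majors labelled alike
    # Each rater's marginal proportion of "IT-related" labels.
    a_it = (both_it + a_it_b_non) / n
    b_it = (both_it + a_non_b_it) / n
    # p_e: agreement expected by chance if the raters labelled independently.
    expected = a_it * b_it + (1 - a_it) * (1 - b_it)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# HYPOTHETICAL cell counts: only the total (426) and the 22 discrepancies
# (12 + 10) are taken from the text.
observed, kappa = agreement_and_kappa(both_it=100, both_non_it=304,
                                      a_it_b_non=12, a_non_b_it=10)
print(f"agreement = {observed:.2%}, kappa = {kappa:.3f}")
```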
Table 2B. Online Learning Gender Distributions (with and without IT Background) 


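Gender | Online Learning (With IT Background) n | Online Learning (With IT Background) % | Online Learning (With no IT Background) n | Online Learning (With no IT Background) %
Male | 891 | 78.989% | 1859 | 61.353%
Female | 225 | 19.947% | 1119 | 36.931%
Other | 8 | 0.709% | 45 | 1.485%
Missing value | 4 | 0.355% | 7 | 0.231%
Total | 1128 | 100% | 3030 | 100%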


Focusing only on males and females, z-tests show that the percentage of females among those with an IT
background is significantly lower than the overall average for online learning (1781/(1781+4369) = 28.96%)
(z = -6.6735, p < 0.001), and the percentage of females among those without an IT background is significantly
higher than that average (z = 9.6740, p < 0.001). The democratizing effect of online learning thus appears
stronger among those who do not have an IT background. It is also interesting to note that females who
already have an IT background are less likely to participate in online learning of coding than their non-IT
counterparts. In fact, the participation rate of females with an IT background in online learning is even lower
than the participation rate of females in formal computer science education (z = -2.9851, p = 0.003).
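The comparisons against the overall online average can be read as one-sample proportion z-tests that treat the overall female share (28.96%) as a fixed reference value p0. This is our assumption, since the paper does not say which variant was used; under it, the sketch below gives z of about −6.5 and 10.4 rather than the reported −6.6735 and 9.6740, although the direction and significance of both results agree. The function name is hypothetical.

```python
import math

def one_sample_proportion_z(x, n, p0):
    """z-test of an observed proportion x/n against a fixed reference value p0."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0: p = p0
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value

# Overall female share of online learners (males and females only, Table 2A).
p_overall = 1781 / (1781 + 4369)

# Table 2B counts, males and females only.
z_it, p_it = one_sample_proportion_z(225, 225 + 891, p_overall)
z_non_it, p_non_it = one_sample_proportion_z(1119, 1119 + 1859, p_overall)
print(f"IT background:    z = {z_it:.2f}, p = {p_it:.1e}")
print(f"no IT background: z = {z_non_it:.2f}, p = {p_non_it:.1e}")
```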
4.3 Online Learning vs. Formal Education: Ethnic Minority


The New Coder Survey directly asked whether the respondent is an ethnic minority. IPEDS divides students
into specific ethnic groups (White, American Indian or Alaska Native, Asian, Black or African American,
Hispanic or Latino, Native Hawaiian or Other Pacific Islander, nonresident alien, race/ethnicity unknown, two or
more races). Here, we classify all groups other than White as minority.


Table 3A. Distributions of Ethnic Minority Status 


Is Ethnic Minority? | Online Learning n | Online Learning % | Formal Education (Computer Science Only) n | Formal Education (Computer Science Only) %
No | 4,284 | 68.380% | 245,463 | 45.607%
Yes | 1,936 | 30.902% | 292,752 | 54.393%
   American Indian or Alaska Native | | | 2,736 | 0.508%
   Asian | | | 45,366 | 8.429%
   Black or African American | | | 51,612 | 9.589%
   Hispanic or Latino | | | 52,848 | 9.819%
   Native Hawaiian or Other Pacific Islander | | | 1,338 | 0.249%
   Nonresident Alien | | | 98,667 | 18.332%
   Race/Ethnic Unknown | | | 26,700 | 4.961%
   Two or More Races | | | 13,485 | 2.506%
Missing value | 45 | 0.718% | 0 | 0%
Total | 6,265 | 100% | 538,215 | 100%


If we exclude the missing values, treat ethnic minority status as a binary variable and perform a z-test to
compare the proportions of ethnic minorities, we find that the proportion of ethnic minorities in online learning
(31.1%) is significantly lower than that in formal education (54.4%) (z = 36.8794, p < 0.001).
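The sketch below shows one way to apply the grouping rule described above: collapse the IPEDS categories into a binary minority flag, then compare the two proportions with a pooled two-proportion z statistic. It assumes the pooled form of the test, which the paper does not specify. It gives a magnitude of about 36.6, close to the reported 36.8794; the sign is negative here only because the online proportion is subtracted first.

```python
import math

# IPEDS completions under CIP 11 by race/ethnicity (Table 3A, formal education).
ipeds = {
    "White": 245463,
    "American Indian or Alaska Native": 2736,
    "Asian": 45366,
    "Black or African American": 51612,
    "Hispanic or Latino": 52848,
    "Native Hawaiian or Other Pacific Islander": 1338,
    "Nonresident Alien": 98667,
    "Race/Ethnic Unknown": 26700,
    "Two or More Races": 13485,
}

# Following the paper's rule, every group other than White counts as minority.
formal_total = sum(ipeds.values())
formal_minority = formal_total - ipeds["White"]

# New Coder Survey (U.S.): "Yes" and "No" answers, missing values excluded.
online_minority, online_non_minority = 1936, 4284
online_total = online_minority + online_non_minority

def pooled_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for p1 - p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

z = pooled_z(online_minority, online_total, formal_minority, formal_total)
print(f"formal minority share = {formal_minority / formal_total:.3%}")
print(f"online minority share = {online_minority / online_total:.3%}")
print(f"z = {z:.2f}")
```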


Table 3B. Online Learning Ethnic Minority Distributions (with and without IT Background) 


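Is Ethnic Minority? | Online Learning (With IT Background) n | Online Learning (With IT Background) % | Online Learning (With no IT Background) n | Online Learning (With no IT Background) %
No | 796 | 70.567% | 2114 | 69.769%
Yes | 325 | 28.812% | 898 | 29.637%
Missing value | 7 | 0.621% | 18 | 0.594%
Total | 1128 | 100% | 3030 | 100%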


From Table 3B, an IT background does not appear to have a significant effect on ethnic diversity in online
learning (z = 0.4988, p = 0.6179). Even when we include only those who do not have an IT background,
online learning still appears to attract a significantly smaller proportion of ethnic minorities than formal
computer science education (z = 26.7654, p < 0.001).


4.4 Online Learning vs. Formal Education: Age 


The age distribution for students majoring in computer science is not available in the IPEDS data set.
However, the age distribution for all students enrolled in U.S. tertiary institutions is available, as shown in
Table 4A. Comparing the age distributions of online learning and formal education, we find that the largest age
group is 25-34 for online learning and 18-21 for formal education, which is not surprising since we noted
earlier that the majority of online learners are degree holders. Excluding the missing values and the unknown
age category, a χ² test on the percentage distributions in Table 4A shows that the distributions are
significantly different (χ² = 5,783, p < 0.001).
Figure 1A graphically compares the two distributions. As shown, starting from the 25-29 age group, the
bars for online learning are consistently taller than those for formal education. This observation suggests that
online learning may help encourage age diversity in computing.
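The text describes the test as being run on the percentage distributions. One way to read this, assumed here, is a χ² goodness-of-fit test that treats the formal-education age distribution (32 million students) as the reference proportions and the online counts as the observations, with 10 age bands giving 9 degrees of freedom. Under this reading the statistic comes out close to the reported 5,783. The critical value 27.877 is the standard tabulated upper 0.1% point of χ² with 9 degrees of freedom.

```python
# Age-band counts from Table 4A, in band order:
# <18, 18-19, 20-21, 22-24, 25-29, 30-34, 35-39, 40-49, 50-64, 65+.
# Missing values and "Age unknown" are excluded, as in the text.
online = [243, 174, 225, 788, 1702, 1258, 681, 734, 347, 25]
formal = [1880218, 7311886, 6795868, 5373464, 4354772,
          2269636, 1452208, 1701600, 875362, 101020]

def chi_square_goodness_of_fit(observed, reference):
    """Chi-square statistic comparing observed counts with the proportions of a
    reference distribution, which is treated as fixed (known) here."""
    n = sum(observed)
    ref_total = sum(reference)
    chi2 = 0.0
    for obs, ref in zip(observed, reference):
        expected = n * ref / ref_total
        chi2 += (obs - expected) ** 2 / expected
    return chi2, len(observed) - 1  # statistic and degrees of freedom

chi2, df = chi_square_goodness_of_fit(online, formal)
CRITICAL_0_001_DF9 = 27.877  # tabulated upper 0.1% point; valid only for df = 9
print(f"chi2 = {chi2:.1f} on {df} df; p < 0.001: {chi2 > CRITICAL_0_001_DF9}")
```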


Table 4A. Age Distributions 


Age | Online Learning n | Online Learning % | Formal Education n | Formal Education %
Age under 18 | 243 | 3.879% | 1,880,218 | 5.848%
Age 18-19 | 174 | 2.777% | 7,311,886 | 22.742%
Age 20-21 | 225 | 3.591% | 6,795,868 | 21.137%
Age 22-24 | 788 | 12.578% | 5,373,464 | 16.713%
Age 25-29 | 1702 | 27.167% | 4,354,772 | 13.545%
Age 30-34 | 1258 | 20.080% | 2,269,636 | 7.059%
Age 35-39 | 681 | 10.870% | 1,452,208 | 4.517%
Age 40-49 | 734 | 11.716% | 1,701,600 | 5.292%
Age 50-64 | 347 | 5.539% | 875,362 | 2.723%
Age 65 and over | 25 | 0.399% | 101,020 | 0.314%
Age unknown | 0 | 0% | 35,526 | 0.110%
Missing value | 88 | 1.405% | 0 | 0%
Total | 6265 | 100.000% | 32,151,560 | 100.000%


Figure 1A. Age Distributions of Online Learning and Formal Education


To evaluate the effect of an IT background on age diversity in online learning, we produce Table 4B and
Figure 1B. Referring to the dark and light bars in Figure 1B, the difference between these two age distributions
seems smaller than that between online learning and formal education. However, it is still statistically
significant (χ² = 38.78, p < 0.001). Among those with no IT background we observe a larger proportion of
learners aged 25 to 34 but a smaller proportion of learners aged 35 to 49. Overall, an independent-samples
t-test reveals that the difference in mean age between those with and without an IT background is not
statistically significant (t = 1.1773, p = 0.2392). It is therefore hard to say whether online learning has a greater
effect on age diversity among people with or without an IT background.
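For the comparison between learners with and without an IT background, a Pearson χ² test of homogeneity on the raw Table 4B counts (a 2 × 10 table, missing values excluded, 9 degrees of freedom) gives a statistic that agrees with the reported χ² = 38.78, which suggests this is the test used. The sketch below computes it from scratch; the function name is our own. Note that the youngest bands have expected counts below 5, which is worth bearing in mind when interpreting the result.

```python
def chi_square_homogeneity(row_a, row_b):
    """Pearson chi-square test of homogeneity for a 2 x k table of counts."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    row_totals = [sum(row_a), sum(row_b)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip((row_a, row_b), row_totals):
        for obs, col_total in zip(row, col_totals):
            # Expected count if both groups shared the same age distribution.
            expected = row_total * col_total / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (2 - 1) * (len(row_a) - 1)
    return chi2, df

# Table 4B counts by age band (<18, 18-19, ..., 50-64, 65+), missing excluded.
with_it = [1, 6, 25, 164, 327, 221, 143, 149, 67, 4]
without_it = [1, 2, 25, 361, 941, 738, 367, 363, 183, 17]

chi2, df = chi_square_homogeneity(with_it, without_it)
print(f"chi2 = {chi2:.2f} on {df} df")
```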
Figure 1B also shows that both online learning groups have an age distribution that is significantly different
from that of formal education (online learning with IT background: χ² = 1,318, p < 0.001; online learning
without IT background: χ² = 4,274, p < 0.001). In other words, greater age diversity is observed in online
learning regardless of IT background.


Table 4B. Online Learning Age Distributions (with and without IT Background) 


Age | Online Learning (With IT Background) n | Online Learning (With IT Background) % | Online Learning (With no IT Background) n | Online Learning (With no IT Background) %
Age under 18 | 1 | 0.0887% | 1 | 0.0330%
Age 18-19 | 6 | 0.5319% | 2 | 0.0660%
Age 20-21 | 25 | 2.2163% | 25 | 0.8251%
Age 22-24 | 164 | 14.5390% | 361 | 11.9142%
Age 25-29 | 327 | 28.9894% | 941 | 31.0561%
Age 30-34 | 221 | 19.5922% | 738 | 24.3564%
Age 35-39 | 143 | 12.6773% | 367 | 12.1122%
Age 40-49 | 149 | 13.2092% | 363 | 11.9802%
Age 50-64 | 67 | 5.9397% | 183 | 6.0396%
Age 65 and over | 4 | 0.3546% | 17 | 0.5611%
Missing value | 21 | 1.8617% | 32 | 1.0561%
Total | 1128 | 100% | 3030 | 100%
Figure 1B. Age Distributions of Online Learning Respondents with and without IT Background (series: Online Learning with IT Background, Online Learning with No IT Background, Formal Education; x-axis: age group)


5. PRACTICAL IMPLICATIONS 


Our analysis has shown that online learning can potentially be used to promote greater diversity in
computing. Females without an IT background and people over 25 may find online learning of coding more
accessible than formal education, which is consistent with Wladis et al (2015) and Johnson et al (2015).
However, online learning seems to have a negative impact on ethnic diversity, again consistent with Wladis et
al (2015).

One possible explanation for the negative impact on ethnic diversity is the language barrier that some
ethnic-minority learners may face on a predominantly English-language learning platform, where the social
cues that assist interpretation are generally lacking. Johnson et al (2015) reported a significantly lower
proportion of speakers of English as a foreign language in online learning than among on-campus university
students, and argued that students who are in the process of enculturation may prefer to acquire language
proficiency and cultural familiarity through on-campus education. Google Inc. and Gallup Inc. (2016)
suggest that ethnic minorities face both social and structural barriers to access and exposure to computer
science. It is important to understand the drivers behind the observed differences between online learning and
traditional face-to-face learning, and to design online learning platforms that promote diversity.

It is also important to be aware that the drivers of diversity may change over time. For example, earlier
studies suggested that online learning may put female students at a disadvantage because they tend to have
lower computer self-efficacy (Shashaani 1997, Thompson and Lynch 2003) and to prefer face-to-face
communication (Anderson 1997). However, the gender difference in computer self-efficacy has largely
disappeared among today's digital natives (Price 2006), so computer self-efficacy is no longer a convincing
explanation for any reluctance of female students to adopt online learning. In fact, more recent studies have
shown that female students tend to benefit more than their male counterparts from social interaction within the
learning platform (Johnson 2011). Our findings also show that female participation in online learning of coding
is higher than in formal computer science education. However, our data sets cannot test the hypothesis that the
opportunity to interact socially on an online platform increases the female participation rate.

We found that the female participation rate in online learning of coding among those with an IT
background is significantly lower than that in formal computer science education, suggesting that female
computer science graduates are less likely to upgrade their coding skills online. This finding is in line with
reports that female STEM graduates are less likely to persist in STEM jobs for reasons such as family
constraints (Glass et al 2013) and dissatisfaction with pay and promotion (Hunt 2016). Further investigation
into ways to improve the retention of women with computer science qualifications in the job market is
recommended.


6. CONCLUSION 


Lack of diversity in Information Technology disciplines such as computer science, and in STEM-related
disciplines more broadly, is a common problem in many societies. This study compares the demographic
distributions of data collected from the 2016 New Coder Survey with those of data obtained from the Integrated
Postsecondary Education Data System (IPEDS). The findings suggest that female and mature learners were
more likely to engage in online self-paced coding education, whereas those who identify as an ethnic
minority were less likely to undertake online coding study. The practical implications of this analysis lie in
the opportunities it suggests for broadening participation.

Female participation in Information Technology-related disciplines such as computer science falls well
behind male participation. Institutions seeking to increase female participation are encouraged to provide a
more supportive environment to cultivate female interest. The research points to a greater percentage of
females seeking the comfort of self-paced online learning when engaging with computer science as novices,
that is, without an IT background.

A similar observation can be made regarding mature-age students. Tertiary institutions looking to expand
their offerings need not look to markets further afield; they could instead market to, and provide a supportive
environment for, older students who wish to return to tertiary study, or to attempt it for the first time, in
order to upskill.

To encourage participation in online study by those who identify as an ethnic minority, more may need to
be done to provide lessons or other external support in languages other than English. Whereas formal tertiary
study has the benefit of fostering communication within student groups, online learning may be more
difficult for non-native speakers of the majority language.
Research on online learners in the new educational paradigm of Massive Open Online Courses is still in its
infancy, and this research seeks to highlight some similarities and differences between the demographics of
online and traditional tertiary courses.